Overview

Dataset statistics

Number of variables: 28
Number of observations: 4916
Missing cells: 2654
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.1 MiB
Average record size in memory: 224.0 B
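The percentage and size figures above can be cross-checked from the raw counts. The sketch below assumes that "Missing cells (%)" is taken over all cells (variables × observations) and that the total size is the average record size times the number of rows; both are inferences from the numbers, not statements from the report.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 28
n_observations = 4916
missing_cells = 2654
avg_record_bytes = 224.0

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size = average record size * number of rows (1 MiB = 2**20 B).
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under these assumptions the recomputed values round to the same 1.9% and 1.1 MiB shown in the report.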

Variable types

Categorical: 12
Numeric: 16

Warnings

movie_title has a high cardinality: 4916 distinct values [High cardinality]
director_name has a high cardinality: 2397 distinct values [High cardinality]
actor_2_name has a high cardinality: 3030 distinct values [High cardinality]
genres has a high cardinality: 914 distinct values [High cardinality]
actor_1_name has a high cardinality: 2095 distinct values [High cardinality]
actor_3_name has a high cardinality: 3519 distinct values [High cardinality]
plot_keywords has a high cardinality: 4756 distinct values [High cardinality]
movie_imdb_link has a high cardinality: 4916 distinct values [High cardinality]
country has a high cardinality: 65 distinct values [High cardinality]
actor_1_facebook_likes is highly correlated with cast_total_facebook_likes [High correlation]
cast_total_facebook_likes is highly correlated with actor_1_facebook_likes [High correlation]
director_name has 102 (2.1%) missing values [Missing]
director_facebook_likes has 102 (2.1%) missing values [Missing]
gross has 862 (17.5%) missing values [Missing]
plot_keywords has 152 (3.1%) missing values [Missing]
content_rating has 300 (6.1%) missing values [Missing]
budget has 484 (9.8%) missing values [Missing]
title_year has 106 (2.2%) missing values [Missing]
aspect_ratio has 326 (6.6%) missing values [Missing]
budget is highly skewed (γ1 = 25.36637236) [Skewed]
movie_title is uniformly distributed [Uniform]
actor_3_name is uniformly distributed [Uniform]
plot_keywords is uniformly distributed [Uniform]
movie_imdb_link is uniformly distributed [Uniform]
movie_title has unique values [Unique]
movie_imdb_link has unique values [Unique]
director_facebook_likes has 877 (17.8%) zeros [Zeros]
actor_3_facebook_likes has 89 (1.8%) zeros [Zeros]
facenumber_in_poster has 2089 (42.5%) zeros [Zeros]
actor_2_facebook_likes has 55 (1.1%) zeros [Zeros]
movie_facebook_likes has 2130 (43.3%) zeros [Zeros]
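The percentages in the Missing and Zeros warnings are consistent with each count divided by the 4916 observations. A minimal sketch of that check follows; the helper `pct` is a name introduced here for illustration and is not part of pandas-profiling.

```python
N_ROWS = 4916  # "Number of observations" from the overview

# Counts copied from the Missing and Zeros warnings.
missing = {
    "director_name": 102,
    "director_facebook_likes": 102,
    "gross": 862,
    "plot_keywords": 152,
    "content_rating": 300,
    "budget": 484,
    "title_year": 106,
    "aspect_ratio": 326,
}
zeros = {
    "director_facebook_likes": 877,
    "actor_3_facebook_likes": 89,
    "facenumber_in_poster": 2089,
    "actor_2_facebook_likes": 55,
    "movie_facebook_likes": 2130,
}

def pct(count, total=N_ROWS):
    """Share of `total` rows, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)

for name, count in missing.items():
    print(f"{name}: {count} missing ({pct(count)}%)")
for name, count in zeros.items():
    print(f"{name}: {count} zeros ({pct(count)}%)")
```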

Reproduction

Analysis started: 2021-04-07 01:46:02.115881
Analysis finished: 2021-04-07 01:46:37.073113
Duration: 34.96 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
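The reported duration follows from the two timestamps. A short sketch using only the standard library, with the timestamp format string inferred from how the values are printed:

```python
from datetime import datetime

# Format inferred from the timestamps as printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-04-07 01:46:02.115881", FMT)
finished = datetime.strptime("2021-04-07 01:46:37.073113", FMT)

elapsed = (finished - started).total_seconds()
print(f"Duration: {elapsed:.2f} seconds")
```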

Variables

movie_title
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 4916
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 38.5 KiB
Les couloirs du temps: Les visiteurs II
 
1
True Grit
 
1
Striptease
 
1
Rio 2
 
1
Christmas with the Kranks
 
1
Other values (4911)
4911 

Length

Max length: 86
Median length: 14
Mean length: 15.3445891
Min length: 1

Characters and Unicode

Total characters: 75434
Distinct characters: 96
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
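
As an illustration of how such character properties can be read, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. This is a stand-in chosen for the example, not necessarily the library the report itself uses, and the `CATEGORY_NAMES` mapping covers only a few of the category labels that appear in this report.

```python
import unicodedata
from collections import Counter

# A few general-category codes and the labels this report uses for them.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}

def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# "Rio 2" is one of the movie titles listed in the report.
counts = category_counts("Rio 2")
for code, n in sorted(counts.items()):
    print(CATEGORY_NAMES.get(code, code), n)
```

Applying `category_counts` to every value of a text column and summing the counters would give a table of the same kind as "Most occurring categories".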

Unique

Unique: 4916
Unique (%): 100.0%

Sample

1st row: Avatar
2nd row: Pirates of the Caribbean: At World's End
3rd row: Spectre
4th row: The Dark Knight Rises
5th row: Star Wars: Episode VII - The Force Awakens
ValueCountFrequency (%)
Les couloirs du temps: Les visiteurs II1
 
< 0.1%
True Grit1
 
< 0.1%
Striptease1
 
< 0.1%
Rio 21
 
< 0.1%
Christmas with the Kranks1
 
< 0.1%
Street Fighter1
 
< 0.1%
Yentl1
 
< 0.1%
Dragonfly1
 
< 0.1%
Unnatural1
 
< 0.1%
Pinocchio1
 
< 0.1%
Other values (4906)4906
99.8%
Histogram of lengths of the category
ValueCountFrequency (%)
the1555
 
11.4%
of473
 
3.5%
a185
 
1.4%
and145
 
1.1%
in121
 
0.9%
to106
 
0.8%
2103
 
0.8%
80
 
0.6%
man66
 
0.5%
love55
 
0.4%
Other values (4905)10759
78.8%

Most occurring characters

ValueCountFrequency (%)
8732
 
11.6%
e7719
 
10.2%
a4737
 
6.3%
o4563
 
6.0%
r4048
 
5.4%
n4043
 
5.4%
i3862
 
5.1%
t3729
 
4.9%
s2934
 
3.9%
h2903
 
3.8%
Other values (86)28164
37.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter53165
70.5%
Uppercase Letter11966
 
15.9%
Space Separator8732
 
11.6%
Other Punctuation943
 
1.3%
Decimal Number517
 
0.7%
Dash Punctuation91
 
0.1%
Open Punctuation5
 
< 0.1%
Close Punctuation5
 
< 0.1%
Currency Symbol4
 
< 0.1%
Other Number2
 
< 0.1%
Other values (3)4
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e7719
14.5%
a4737
 
8.9%
o4563
 
8.6%
r4048
 
7.6%
n4043
 
7.6%
i3862
 
7.3%
t3729
 
7.0%
s2934
 
5.5%
h2903
 
5.5%
l2475
 
4.7%
Other values (25)12152
22.9%
ValueCountFrequency (%)
T1678
14.0%
S1034
 
8.6%
M813
 
6.8%
B764
 
6.4%
D710
 
5.9%
C672
 
5.6%
A652
 
5.4%
L563
 
4.7%
H552
 
4.6%
W497
 
4.2%
Other values (17)4031
33.7%
ValueCountFrequency (%)
:366
38.8%
'229
24.3%
.145
 
15.4%
,77
 
8.2%
&61
 
6.5%
!32
 
3.4%
?16
 
1.7%
/8
 
0.8%
*5
 
0.5%
#2
 
0.2%
Other values (2)2
 
0.2%
ValueCountFrequency (%)
2145
28.0%
386
16.6%
182
15.9%
081
15.7%
435
 
6.8%
821
 
4.1%
521
 
4.1%
917
 
3.3%
715
 
2.9%
614
 
2.7%
ValueCountFrequency (%)
¢2
50.0%
$2
50.0%
ValueCountFrequency (%)
(3
60.0%
[2
40.0%
ValueCountFrequency (%)
)3
60.0%
]2
40.0%
ValueCountFrequency (%)
8732
100.0%
ValueCountFrequency (%)
-91
100.0%
ValueCountFrequency (%)
½2
100.0%
ValueCountFrequency (%)
+2
100.0%
ValueCountFrequency (%)
_1
100.0%
ValueCountFrequency (%)
°1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin65131
86.3%
Common10303
 
13.7%

Most frequent character per script

ValueCountFrequency (%)
e7719
 
11.9%
a4737
 
7.3%
o4563
 
7.0%
r4048
 
6.2%
n4043
 
6.2%
i3862
 
5.9%
t3729
 
5.7%
s2934
 
4.5%
h2903
 
4.5%
l2475
 
3.8%
Other values (52)24118
37.0%
ValueCountFrequency (%)
8732
84.8%
:366
 
3.6%
'229
 
2.2%
2145
 
1.4%
.145
 
1.4%
-91
 
0.9%
386
 
0.8%
182
 
0.8%
081
 
0.8%
,77
 
0.7%
Other values (24)269
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII75411
> 99.9%
None23
 
< 0.1%

Most frequent character per block

ValueCountFrequency (%)
8732
 
11.6%
e7719
 
10.2%
a4737
 
6.3%
o4563
 
6.1%
r4048
 
5.4%
n4043
 
5.4%
i3862
 
5.1%
t3729
 
4.9%
s2934
 
3.9%
h2903
 
3.8%
Other values (72)28141
37.3%
ValueCountFrequency (%)
é8
34.8%
¢2
 
8.7%
½2
 
8.7%
·1
 
4.3%
à1
 
4.3%
Æ1
 
4.3%
ü1
 
4.3%
è1
 
4.3%
ä1
 
4.3%
á1
 
4.3%
Other values (4)4
17.4%

color
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 19
Missing (%): 0.4%
Memory size: 38.5 KiB
Color: 4693
Black and White: 204

Length

Max length: 15
Median length: 5
Mean length: 5.416581581
Min length: 5

Characters and Unicode

Total characters: 26525
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Color
2nd row: Color
3rd row: Color
4th row: Color
5th row: Color
Value              Count   Frequency (%)
Color              4693    95.5%
Black and White    204     4.1%
(Missing)          19      0.4%
Histogram of lengths of the category
ValueCountFrequency (%)
color4693
88.5%
white204
 
3.8%
and204
 
3.8%
black204
 
3.8%

Most occurring characters

ValueCountFrequency (%)
o9386
35.4%
l4897
18.5%
C4693
17.7%
r4693
17.7%
a408
 
1.5%
408
 
1.5%
B204
 
0.8%
c204
 
0.8%
k204
 
0.8%
n204
 
0.8%
Other values (6)1224
 
4.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter21016
79.2%
Uppercase Letter5101
 
19.2%
Space Separator408
 
1.5%

Most frequent character per category

ValueCountFrequency (%)
o9386
44.7%
l4897
23.3%
r4693
22.3%
a408
 
1.9%
c204
 
1.0%
k204
 
1.0%
n204
 
1.0%
d204
 
1.0%
h204
 
1.0%
i204
 
1.0%
Other values (2)408
 
1.9%
ValueCountFrequency (%)
C4693
92.0%
B204
 
4.0%
W204
 
4.0%
ValueCountFrequency (%)
408
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin26117
98.5%
Common408
 
1.5%

Most frequent character per script

ValueCountFrequency (%)
o9386
35.9%
l4897
18.8%
C4693
18.0%
r4693
18.0%
a408
 
1.6%
B204
 
0.8%
c204
 
0.8%
k204
 
0.8%
n204
 
0.8%
d204
 
0.8%
Other values (5)1020
 
3.9%
ValueCountFrequency (%)
408
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII26525
100.0%

Most frequent character per block

ValueCountFrequency (%)
o9386
35.4%
l4897
18.5%
C4693
17.7%
r4693
17.7%
a408
 
1.5%
408
 
1.5%
B204
 
0.8%
c204
 
0.8%
k204
 
0.8%
n204
 
0.8%
Other values (6)1224
 
4.6%

director_name
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2397
Distinct (%): 49.8%
Missing: 102
Missing (%): 2.1%
Memory size: 38.5 KiB
Steven Spielberg
 
26
Woody Allen
 
22
Clint Eastwood
 
20
Martin Scorsese
 
20
Spike Lee
 
16
Other values (2392)
4710 

Length

Max length: 32
Median length: 13
Mean length: 13.0847528
Min length: 3

Characters and Unicode

Total characters: 62990
Distinct characters: 76
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 1523
Unique (%): 31.6%
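
For director_name, the Distinct (%) and Unique (%) figures match the counts divided by the number of non-missing rows rather than by all 4916 rows. That denominator is an inference from the numbers shown here, sketched below:

```python
# Figures copied from the director_name section.
n_rows = 4916
missing = 102
distinct = 2397
unique = 1523

# Assumption: percentages are relative to rows that actually hold a value.
non_missing = n_rows - missing

distinct_pct = 100 * distinct / non_missing
unique_pct = 100 * unique / non_missing

print(f"Non-missing rows: {non_missing}")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Unique (%): {unique_pct:.1f}%")
```

Dividing by all 4916 rows instead would give about 48.8% and 31.0%, which do not match the report.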

Sample

1st row: James Cameron
2nd row: Gore Verbinski
3rd row: Sam Mendes
4th row: Christopher Nolan
5th row: Doug Walker
ValueCountFrequency (%)
Steven Spielberg26
 
0.5%
Woody Allen22
 
0.4%
Clint Eastwood20
 
0.4%
Martin Scorsese20
 
0.4%
Spike Lee16
 
0.3%
Ridley Scott16
 
0.3%
Renny Harlin15
 
0.3%
Steven Soderbergh15
 
0.3%
Oliver Stone14
 
0.3%
Tim Burton14
 
0.3%
Other values (2387)4636
94.3%
(Missing)102
 
2.1%
Histogram of lengths of the category
ValueCountFrequency (%)
john174
 
1.7%
david144
 
1.4%
michael125
 
1.2%
james87
 
0.9%
robert83
 
0.8%
peter81
 
0.8%
richard78
 
0.8%
paul73
 
0.7%
scott62
 
0.6%
lee56
 
0.6%
Other values (2965)9048
90.4%

Most occurring characters

ValueCountFrequency (%)
e5936
 
9.4%
5197
 
8.3%
a5154
 
8.2%
n4547
 
7.2%
r4333
 
6.9%
o3684
 
5.8%
i3605
 
5.7%
l2917
 
4.6%
t2257
 
3.6%
s2040
 
3.2%
Other values (66)23320
37.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter47237
75.0%
Uppercase Letter10221
 
16.2%
Space Separator5197
 
8.3%
Other Punctuation251
 
0.4%
Dash Punctuation84
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
e5936
12.6%
a5154
10.9%
n4547
9.6%
r4333
 
9.2%
o3684
 
7.8%
i3605
 
7.6%
l2917
 
6.2%
t2257
 
4.8%
s2040
 
4.3%
h1799
 
3.8%
Other values (31)10965
23.2%
ValueCountFrequency (%)
S977
 
9.6%
J893
 
8.7%
M872
 
8.5%
R733
 
7.2%
C688
 
6.7%
B658
 
6.4%
D602
 
5.9%
A558
 
5.5%
L490
 
4.8%
P472
 
4.6%
Other values (21)3278
32.1%
ValueCountFrequency (%)
.231
92.0%
'20
 
8.0%
ValueCountFrequency (%)
5197
100.0%
ValueCountFrequency (%)
-84
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin57458
91.2%
Common5532
 
8.8%

Most frequent character per script

ValueCountFrequency (%)
e5936
 
10.3%
a5154
 
9.0%
n4547
 
7.9%
r4333
 
7.5%
o3684
 
6.4%
i3605
 
6.3%
l2917
 
5.1%
t2257
 
3.9%
s2040
 
3.6%
h1799
 
3.1%
Other values (62)21186
36.9%
ValueCountFrequency (%)
5197
93.9%
.231
 
4.2%
-84
 
1.5%
'20
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII62850
99.8%
None140
 
0.2%

Most frequent character per block

ValueCountFrequency (%)
e5936
 
9.4%
5197
 
8.3%
a5154
 
8.2%
n4547
 
7.2%
r4333
 
6.9%
o3684
 
5.9%
i3605
 
5.7%
l2917
 
4.6%
t2257
 
3.6%
s2040
 
3.2%
Other values (46)23180
36.9%
ValueCountFrequency (%)
é43
30.7%
á19
13.6%
ó16
 
11.4%
ö16
 
11.4%
í8
 
5.7%
ñ7
 
5.0%
å6
 
4.3%
ç5
 
3.6%
É3
 
2.1%
ø2
 
1.4%
Other values (10)15
 
10.7%

num_critic_for_reviews
Real number (ℝ≥0)

Distinct: 528
Distinct (%): 10.8%
Missing: 49
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 137.9889049
Minimum: 1
Maximum: 813
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 9
Q1: 49
Median: 108
Q3: 191
95th percentile: 378
Maximum: 813
Range: 812
Interquartile range (IQR): 142

Descriptive statistics

Standard deviation: 120.2393792
Coefficient of variation (CV): 0.8713699067
Kurtosis: 3.048992494
Mean: 137.9889049
Median Absolute Deviation (MAD): 67
Skewness: 1.545198025
Sum: 671592
Variance: 14457.5083
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
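
Several of the num_critic_for_reviews statistics are derived from others, and the relationships can be checked directly. The sketch below uses only the figures printed in this section; the variable names are chosen here for illustration.

```python
import math

# Figures copied from the num_critic_for_reviews section.
minimum, maximum = 1, 813
q1, q3 = 49, 191
mean = 137.9889049
std = 120.2393792
total = 671592          # "Sum"
n_rows, missing = 4916, 49

value_range = maximum - minimum      # Range
iqr = q3 - q1                        # Interquartile range
cv = std / mean                      # Coefficient of variation
variance = std ** 2                  # Variance is the squared standard deviation
n_values = round(total / mean)       # Sum / Mean recovers the non-missing count

print(value_range, iqr)
print(f"CV = {cv:.10f}, variance = {variance:.4f}")
print(n_values, n_rows - missing)
```

The recovered count equals the number of observations minus the missing values, which suggests the mean and sum are computed over non-missing rows only.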
Value                 Count   Frequency (%)
1                     40      0.8%
5                     36      0.7%
9                     36      0.7%
12                    34      0.7%
8                     34      0.7%
10                    34      0.7%
16                    33      0.7%
81                    31      0.6%
29                    30      0.6%
43                    30      0.6%
Other values (518)    4529    92.1%
(Missing)             49      1.0%

Value    Count   Frequency (%)
1        40      0.8%
2        26      0.5%
3        24      0.5%
4        29      0.6%
5        36      0.7%

Value    Count   Frequency (%)
813      1       < 0.1%
775      1       < 0.1%
765      1       < 0.1%
750      1       < 0.1%
739      1       < 0.1%

duration
Real number (ℝ≥0)

Distinct: 191
Distinct (%): 3.9%
Missing: 15
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.0907978
Minimum: 7
Maximum: 511
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 7
5th percentile: 81
Q1: 93
Median: 103
Q3: 118
95th percentile: 146
Maximum: 511
Range: 504
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 25.28601531
Coefficient of variation (CV): 0.2361175361
Kurtosis: 22.79584339
Mean: 107.0907978
Median Absolute Deviation (MAD): 12
Skewness: 2.357977091
Sum: 524852
Variance: 639.3825704
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
90160
 
3.3%
100137
 
2.8%
98135
 
2.7%
101132
 
2.7%
97131
 
2.7%
93125
 
2.5%
99123
 
2.5%
94122
 
2.5%
95121
 
2.5%
96111
 
2.3%
Other values (181)3604
73.3%
ValueCountFrequency (%)
72
 
< 0.1%
111
 
< 0.1%
141
 
< 0.1%
201
 
< 0.1%
227
0.1%
ValueCountFrequency (%)
5111
< 0.1%
3341
< 0.1%
3301
< 0.1%
3251
< 0.1%
3001
< 0.1%

director_facebook_likes
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 435
Distinct (%): 9.0%
Missing: 102
Missing (%): 2.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 691.0145409
Minimum: 0
Maximum: 23000
Zeros: 877
Zeros (%): 17.8%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 7
Median: 48
Q3: 189.75
95th percentile: 982.45
Maximum: 23000
Range: 23000
Interquartile range (IQR): 182.75

Descriptive statistics

Standard deviation: 2832.954125
Coefficient of variation (CV): 4.099702621
Kurtosis: 26.97306552
Mean: 691.0145409
Median Absolute Deviation (MAD): 48
Skewness: 5.205766151
Sum: 3326544
Variance: 8025629.073
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0877
 
17.8%
369
 
1.4%
666
 
1.3%
763
 
1.3%
263
 
1.3%
460
 
1.2%
1157
 
1.2%
1053
 
1.1%
552
 
1.1%
851
 
1.0%
Other values (425)3403
69.2%
(Missing)102
 
2.1%
ValueCountFrequency (%)
0877
17.8%
263
 
1.3%
369
 
1.4%
460
 
1.2%
552
 
1.1%
ValueCountFrequency (%)
230001
 
< 0.1%
220008
0.2%
2100010
0.2%
200001
 
< 0.1%
180004
 
0.1%

actor_3_facebook_likes
Real number (ℝ≥0)

ZEROS

Distinct: 906
Distinct (%): 18.5%
Missing: 23
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 631.2763131
Minimum: 0
Maximum: 23000
Zeros: 89
Zeros (%): 1.8%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 9
Q1: 132
Median: 366
Q3: 633
95th percentile: 1000
Maximum: 23000
Range: 23000
Interquartile range (IQR): 501

Descriptive statistics

Standard deviation: 1625.874802
Coefficient of variation (CV): 2.575535891
Kurtosis: 63.57766761
Mean: 631.2763131
Median Absolute Deviation (MAD): 246
Skewness: 7.441519978
Sum: 3088835
Variance: 2643468.87
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1000118
 
2.4%
089
 
1.8%
1100027
 
0.5%
200026
 
0.5%
326
 
0.5%
300024
 
0.5%
421
 
0.4%
82621
 
0.4%
721
 
0.4%
220
 
0.4%
Other values (896)4500
91.5%
(Missing)23
 
0.5%
ValueCountFrequency (%)
089
1.8%
220
 
0.4%
326
 
0.5%
421
 
0.4%
518
 
0.4%
ValueCountFrequency (%)
230002
< 0.1%
200001
 
< 0.1%
190004
0.1%
170001
 
< 0.1%
160003
0.1%

actor_2_name
Categorical

HIGH CARDINALITY

Distinct: 3030
Distinct (%): 61.8%
Missing: 13
Missing (%): 0.3%
Memory size: 38.5 KiB
Morgan Freeman
 
18
Charlize Theron
 
14
Brad Pitt
 
13
Meryl Streep
 
11
Adam Sandler
 
10
Other values (3025)
4837 

Length

Max length: 28
Median length: 13
Mean length: 13.07526004
Min length: 3

Characters and Unicode

Total characters: 64108
Distinct characters: 80
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 2125
Unique (%): 43.3%

Sample

1st row: Joel David Moore
2nd row: Orlando Bloom
3rd row: Rory Kinnear
4th row: Christian Bale
5th row: Rob Walker
ValueCountFrequency (%)
Morgan Freeman18
 
0.4%
Charlize Theron14
 
0.3%
Brad Pitt13
 
0.3%
Meryl Streep11
 
0.2%
Adam Sandler10
 
0.2%
James Franco10
 
0.2%
Will Ferrell9
 
0.2%
Scott Glenn9
 
0.2%
Bruce Willis9
 
0.2%
Jada Pinkett Smith8
 
0.2%
Other values (3020)4792
97.5%
(Missing)13
 
0.3%
Histogram of lengths of the category
ValueCountFrequency (%)
michael102
 
1.0%
david58
 
0.6%
john55
 
0.5%
james52
 
0.5%
scott51
 
0.5%
tom50
 
0.5%
jason41
 
0.4%
robert41
 
0.4%
kevin40
 
0.4%
bruce39
 
0.4%
Other values (3823)9614
94.8%

Most occurring characters

ValueCountFrequency (%)
e6048
 
9.4%
a5789
 
9.0%
5240
 
8.2%
n4623
 
7.2%
r4295
 
6.7%
i3930
 
6.1%
o3553
 
5.5%
l3329
 
5.2%
t2287
 
3.6%
s2106
 
3.3%
Other values (70)22908
35.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter48195
75.2%
Uppercase Letter10422
 
16.3%
Space Separator5240
 
8.2%
Other Punctuation182
 
0.3%
Dash Punctuation63
 
0.1%
Decimal Number6
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e6048
12.5%
a5789
12.0%
n4623
9.6%
r4295
8.9%
i3930
 
8.2%
o3553
 
7.4%
l3329
 
6.9%
t2287
 
4.7%
s2106
 
4.4%
h1761
 
3.7%
Other values (38)10474
21.7%
ValueCountFrequency (%)
M972
 
9.3%
S799
 
7.7%
C793
 
7.6%
B761
 
7.3%
J750
 
7.2%
D646
 
6.2%
A625
 
6.0%
R580
 
5.6%
L501
 
4.8%
T448
 
4.3%
Other values (16)3547
34.0%
ValueCountFrequency (%)
.119
65.4%
'63
34.6%
ValueCountFrequency (%)
53
50.0%
03
50.0%
ValueCountFrequency (%)
5240
100.0%
ValueCountFrequency (%)
-63
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin58617
91.4%
Common5491
 
8.6%

Most frequent character per script

ValueCountFrequency (%)
e6048
 
10.3%
a5789
 
9.9%
n4623
 
7.9%
r4295
 
7.3%
i3930
 
6.7%
o3553
 
6.1%
l3329
 
5.7%
t2287
 
3.9%
s2106
 
3.6%
h1761
 
3.0%
Other values (64)20896
35.6%
ValueCountFrequency (%)
5240
95.4%
.119
 
2.2%
-63
 
1.1%
'63
 
1.1%
53
 
0.1%
03
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII63989
99.8%
None119
 
0.2%

Most frequent character per block

ValueCountFrequency (%)
e6048
 
9.5%
a5789
 
9.0%
5240
 
8.2%
n4623
 
7.2%
r4295
 
6.7%
i3930
 
6.1%
o3553
 
5.6%
l3329
 
5.2%
t2287
 
3.6%
s2106
 
3.3%
Other values (48)22789
35.6%
ValueCountFrequency (%)
é43
36.1%
í12
 
10.1%
á10
 
8.4%
ë8
 
6.7%
ø6
 
5.0%
ó6
 
5.0%
å4
 
3.4%
ü4
 
3.4%
ç3
 
2.5%
û3
 
2.5%
Other values (12)20
16.8%

actor_1_facebook_likes
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 877
Distinct (%): 17.9%
Missing: 7
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 6494.488491
Minimum: 0
Maximum: 640000
Zeros: 26
Zeros (%): 0.5%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 93
Q1: 607
Median: 982
Q3: 11000
95th percentile: 23000
Maximum: 640000
Range: 640000
Interquartile range (IQR): 10393

Descriptive statistics

Standard deviation: 15106.98688
Coefficient of variation (CV): 2.326124206
Kurtosis: 685.6853809
Mean: 6494.488491
Median Absolute Deviation (MAD): 738
Skewness: 19.27602317
Sum: 31881444
Variance: 228221052.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1000436
 
8.9%
11000206
 
4.2%
2000189
 
3.8%
3000150
 
3.1%
12000131
 
2.7%
13000123
 
2.5%
14000120
 
2.4%
10000109
 
2.2%
18000106
 
2.2%
2200080
 
1.6%
Other values (867)3259
66.3%
ValueCountFrequency (%)
026
0.5%
28
 
0.2%
34
 
0.1%
42
 
< 0.1%
57
 
0.1%
ValueCountFrequency (%)
6400001
 
< 0.1%
2600003
 
0.1%
1640002
 
< 0.1%
1370002
 
< 0.1%
870008
0.2%

gross
Real number (ℝ≥0)

MISSING

Distinct: 4033
Distinct (%): 99.5%
Missing: 862
Missing (%): 17.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 47644514.53
Minimum: 162
Maximum: 760505847
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 162
5th percentile: 96209.7
Q1: 5019656.25
Median: 25043962
Q3: 61108412.75
95th percentile: 177424688.4
Maximum: 760505847
Range: 760505685
Interquartile range (IQR): 56088756.5

Descriptive statistics

Standard deviation: 67372553.83
Coefficient of variation (CV): 1.41406738
Kurtosis: 14.93226526
Mean: 47644514.53
Median Absolute Deviation (MAD): 22912754.5
Skewness: 3.12698398
Sum: 1.931508619 × 10^11
Variance: 4.53906101 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
30000003
 
0.1%
80000003
 
0.1%
10000002
 
< 0.1%
304000002
 
< 0.1%
362000002
 
< 0.1%
320000002
 
< 0.1%
264000002
 
< 0.1%
250000002
 
< 0.1%
764000002
 
< 0.1%
8000002
 
< 0.1%
Other values (4023)4032
82.0%
(Missing)862
 
17.5%
ValueCountFrequency (%)
1621
< 0.1%
7031
< 0.1%
7211
< 0.1%
8281
< 0.1%
11111
< 0.1%
ValueCountFrequency (%)
7605058471
< 0.1%
6586723021
< 0.1%
6521772711
< 0.1%
6232795471
< 0.1%
5333160611
< 0.1%

genres
Categorical

HIGH CARDINALITY

Distinct: 914
Distinct (%): 18.6%
Missing: 0
Missing (%): 0.0%
Memory size: 38.5 KiB
Drama
 
233
Comedy
 
205
Comedy|Drama
 
189
Comedy|Drama|Romance
 
185
Comedy|Romance
 
157
Other values (909)
3947 

Length

Max length: 64
Median length: 20
Mean length: 20.28417413
Min length: 5

Characters and Unicode

Total characters: 99717
Distinct characters: 35
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 507
Unique (%): 10.3%

Sample

1st row: Action|Adventure|Fantasy|Sci-Fi
2nd row: Action|Adventure|Fantasy
3rd row: Action|Adventure|Thriller
4th row: Action|Thriller
5th row: Documentary
ValueCountFrequency (%)
Drama233
 
4.7%
Comedy205
 
4.2%
Comedy|Drama189
 
3.8%
Comedy|Drama|Romance185
 
3.8%
Comedy|Romance157
 
3.2%
Drama|Romance150
 
3.1%
Crime|Drama|Thriller98
 
2.0%
Horror67
 
1.4%
Action|Crime|Drama|Thriller65
 
1.3%
Drama|Thriller62
 
1.3%
Other values (904)3505
71.3%
Histogram of lengths of the category
ValueCountFrequency (%)
drama233
 
4.7%
comedy205
 
4.2%
comedy|drama189
 
3.8%
comedy|drama|romance185
 
3.8%
comedy|romance157
 
3.2%
drama|romance150
 
3.1%
crime|drama|thriller98
 
2.0%
horror67
 
1.4%
action|crime|drama|thriller65
 
1.3%
action|crime|thriller62
 
1.3%
Other values (904)3505
71.3%

Most occurring characters

ValueCountFrequency (%)
r10220
 
10.2%
|9208
 
9.2%
a8846
 
8.9%
e7738
 
7.8%
m7234
 
7.3%
i6394
 
6.4%
o6163
 
6.2%
y4550
 
4.6%
n4363
 
4.4%
t3910
 
3.9%
Other values (25)31091
31.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter75179
75.4%
Uppercase Letter14728
 
14.8%
Math Symbol9208
 
9.2%
Dash Punctuation602
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
r10220
13.6%
a8846
11.8%
e7738
10.3%
m7234
9.6%
i6394
8.5%
o6163
8.2%
y4550
 
6.1%
n4363
 
5.8%
t3910
 
5.2%
l3399
 
4.5%
Other values (9)12362
16.4%
ValueCountFrequency (%)
C2715
18.4%
D2654
18.0%
A2241
15.2%
F1716
11.7%
T1365
9.3%
R1086
 
7.4%
M828
 
5.6%
S776
 
5.3%
H740
 
5.0%
W304
 
2.1%
Other values (4)303
 
2.1%
ValueCountFrequency (%)
|9208
100.0%
ValueCountFrequency (%)
-602
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin89907
90.2%
Common9810
 
9.8%

Most frequent character per script

ValueCountFrequency (%)
r10220
 
11.4%
a8846
 
9.8%
e7738
 
8.6%
m7234
 
8.0%
i6394
 
7.1%
o6163
 
6.9%
y4550
 
5.1%
n4363
 
4.9%
t3910
 
4.3%
l3399
 
3.8%
Other values (23)27090
30.1%
ValueCountFrequency (%)
|9208
93.9%
-602
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII99717
100.0%

Most frequent character per block

ValueCountFrequency (%)
r10220
 
10.2%
|9208
 
9.2%
a8846
 
8.9%
e7738
 
7.8%
m7234
 
7.3%
i6394
 
6.4%
o6163
 
6.2%
y4550
 
4.6%
n4363
 
4.4%
t3910
 
3.9%
Other values (25)31091
31.2%

actor_1_name
Categorical

HIGH CARDINALITY

Distinct: 2095
Distinct (%): 42.7%
Missing: 7
Missing (%): 0.1%
Memory size: 38.5 KiB
Robert De Niro
 
48
Johnny Depp
 
36
Nicolas Cage
 
32
J.K. Simmons
 
29
Matt Damon
 
29
Other values (2090)
4735 

Length

Max length: 27
Median length: 13
Mean length: 13.20228152
Min length: 4

Characters and Unicode

Total characters: 64810
Distinct characters: 76
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 1379
Unique (%): 28.1%

Sample

1st row: CCH Pounder
2nd row: Johnny Depp
3rd row: Christoph Waltz
4th row: Tom Hardy
5th row: Doug Walker
ValueCountFrequency (%)
Robert De Niro48
 
1.0%
Johnny Depp36
 
0.7%
Nicolas Cage32
 
0.7%
J.K. Simmons29
 
0.6%
Matt Damon29
 
0.6%
Denzel Washington29
 
0.6%
Bruce Willis28
 
0.6%
Steve Buscemi27
 
0.5%
Liam Neeson27
 
0.5%
Harrison Ford27
 
0.5%
Other values (2085)4597
93.5%
Histogram of lengths of the category
ValueCountFrequency (%)
robert106
 
1.0%
tom90
 
0.9%
michael88
 
0.9%
jason57
 
0.6%
de56
 
0.5%
james52
 
0.5%
steve50
 
0.5%
bruce49
 
0.5%
niro48
 
0.5%
jr47
 
0.5%
Other values (2885)9539
93.7%

Most occurring characters

ValueCountFrequency (%)
e6058
 
9.3%
a5606
 
8.6%
5273
 
8.1%
n4700
 
7.3%
r4215
 
6.5%
i4140
 
6.4%
o3821
 
5.9%
l3242
 
5.0%
t2506
 
3.9%
s2281
 
3.5%
Other values (66)22968
35.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter48806
75.3%
Uppercase Letter10441
 
16.1%
Space Separator5273
 
8.1%
Other Punctuation217
 
0.3%
Dash Punctuation71
 
0.1%
Decimal Number2
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e6058
12.4%
a5606
11.5%
n4700
9.6%
r4215
8.6%
i4140
 
8.5%
o3821
 
7.8%
l3242
 
6.6%
t2506
 
5.1%
s2281
 
4.7%
h1747
 
3.6%
Other values (32)10490
21.5%
ValueCountFrequency (%)
J918
 
8.8%
M894
 
8.6%
S831
 
8.0%
C799
 
7.7%
B729
 
7.0%
D706
 
6.8%
R617
 
5.9%
H511
 
4.9%
A496
 
4.8%
L479
 
4.6%
Other values (18)3461
33.1%
ValueCountFrequency (%)
.173
79.7%
'44
 
20.3%
ValueCountFrequency (%)
51
50.0%
01
50.0%
ValueCountFrequency (%)
5273
100.0%
ValueCountFrequency (%)
-71
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin59247
91.4%
Common5563
 
8.6%

Most frequent character per script

ValueCountFrequency (%)
e6058
 
10.2%
a5606
 
9.5%
n4700
 
7.9%
r4215
 
7.1%
i4140
 
7.0%
o3821
 
6.4%
l3242
 
5.5%
t2506
 
4.2%
s2281
 
3.8%
h1747
 
2.9%
Other values (60)20931
35.3%
ValueCountFrequency (%)
5273
94.8%
.173
 
3.1%
-71
 
1.3%
'44
 
0.8%
51
 
< 0.1%
01
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII64732
99.9%
None78
 
0.1%

Most frequent character per block

ValueCountFrequency (%)
e6058
 
9.4%
a5606
 
8.7%
5273
 
8.1%
n4700
 
7.3%
r4215
 
6.5%
i4140
 
6.4%
o3821
 
5.9%
l3242
 
5.0%
t2506
 
3.9%
s2281
 
3.5%
Other values (48)22890
35.4%
ValueCountFrequency (%)
é19
24.4%
ë14
17.9%
á7
 
9.0%
í6
 
7.7%
ç5
 
6.4%
å5
 
6.4%
ø4
 
5.1%
Ó3
 
3.8%
ô2
 
2.6%
à2
 
2.6%
Other values (8)11
14.1%

num_voted_users
Real number (ℝ≥0)

Distinct: 4750
Distinct (%): 96.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 82644.92494
Minimum: 5
Maximum: 1689764
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 5
5th percentile: 507.25
Q1: 8361.75
Median: 33132.5
Q3: 93772.75
95th percentile: 330310
Maximum: 1689764
Range: 1689759
Interquartile range (IQR): 85411

Descriptive statistics

Standard deviation: 138322.1625
Coefficient of variation (CV): 1.673692155
Kurtosis: 24.91998064
Mean: 82644.92494
Median Absolute Deviation (MAD): 29810.5
Skewness: 4.074557576
Sum: 406282451
Variance: 1.913302065 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
575
 
0.1%
64
 
0.1%
25413
 
0.1%
383
 
0.1%
533
 
0.1%
31193
 
0.1%
36653
 
0.1%
83
 
0.1%
1623
 
0.1%
39432
 
< 0.1%
Other values (4740)4884
99.3%
ValueCountFrequency (%)
52
< 0.1%
64
0.1%
72
< 0.1%
83
0.1%
101
 
< 0.1%
ValueCountFrequency (%)
16897641
< 0.1%
16761691
< 0.1%
14682001
< 0.1%
13474611
< 0.1%
13246801
< 0.1%

cast_total_facebook_likes
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3960
Distinct (%): 80.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9579.815907
Minimum: 0
Maximum: 656730
Zeros: 33
Zeros (%): 0.7%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 173.75
Q1: 1394.75
Median: 3049
Q3: 13616.75
95th percentile: 36483.75
Maximum: 656730
Range: 656730
Interquartile range (IQR): 12222

Descriptive statistics

Standard deviation: 18164.31699
Coefficient of variation (CV): 1.896102928
Kurtosis: 370.782513
Mean: 9579.815907
Median Absolute Deviation (MAD): 2262.5
Skewness: 13.12069073
Sum: 47094375
Variance: 329942411.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
033
 
0.7%
57
 
0.1%
26
 
0.1%
295
 
0.1%
20205
 
0.1%
10445
 
0.1%
27304
 
0.1%
15544
 
0.1%
814
 
0.1%
19364
 
0.1%
Other values (3950)4839
98.4%
ValueCountFrequency (%)
033
0.7%
26
 
0.1%
31
 
< 0.1%
42
 
< 0.1%
57
 
0.1%
ValueCountFrequency (%)
6567301
< 0.1%
3037171
< 0.1%
2839391
< 0.1%
2635841
< 0.1%
2618181
< 0.1%

actor_3_name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 3519
Distinct (%): 71.9%
Missing: 23
Missing (%): 0.5%
Memory size: 38.5 KiB
Steve Coogan
 
8
Stephen Root
 
7
Jon Gries
 
7
Ben Mendelsohn
 
7
Robert Duvall
 
7
Other values (3514)
4857 

Length

Max length: 29
Median length: 13
Mean length: 13.07888821
Min length: 3

Characters and Unicode

Total characters: 63995
Distinct characters: 81
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 2704
Unique (%): 55.3%

Sample

1st row: Wes Studi
2nd row: Jack Davenport
3rd row: Stephanie Sigman
4th row: Joseph Gordon-Levitt
5th row: Polly Walker
ValueCountFrequency (%)
Steve Coogan8
 
0.2%
Stephen Root7
 
0.1%
Jon Gries7
 
0.1%
Ben Mendelsohn7
 
0.1%
Robert Duvall7
 
0.1%
Sam Shepard7
 
0.1%
Paul Sorvino6
 
0.1%
Anne Hathaway6
 
0.1%
Lois Maxwell6
 
0.1%
Kirsten Dunst6
 
0.1%
Other values (3509)4826
98.2%
(Missing)23
 
0.5%
Histogram of lengths of the category
ValueCountFrequency (%)
michael85
 
0.8%
john78
 
0.8%
david68
 
0.7%
james66
 
0.7%
robert45
 
0.4%
tom42
 
0.4%
kevin41
 
0.4%
paul39
 
0.4%
peter38
 
0.4%
scott36
 
0.4%
Other values (4305)9592
94.7%

Most occurring characters

Value               Count   Frequency (%)
e                   6030    9.4%
a                   5847    9.1%
(space)             5237    8.2%
n                   4474    7.0%
r                   4079    6.4%
i                   3867    6.0%
o                   3490    5.5%
l                   3413    5.3%
t                   2311    3.6%
s                   2265    3.5%
Other values (71)   22982   35.9%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    48035   75.1%
Uppercase Letter    10417   16.3%
Space Separator     5237    8.2%
Other Punctuation   226     0.4%
Dash Punctuation    78      0.1%
Decimal Number      2       < 0.1%

Most frequent character per category

Value               Count   Frequency (%)
e                   6030    12.6%
a                   5847    12.2%
n                   4474    9.3%
r                   4079    8.5%
i                   3867    8.1%
o                   3490    7.3%
l                   3413    7.1%
t                   2311    4.8%
s                   2265    4.7%
h                   1810    3.8%
Other values (34)   10449   21.8%

Value               Count   Frequency (%)
M                   953     9.1%
S                   815     7.8%
J                   810     7.8%
B                   781     7.5%
C                   774     7.4%
D                   635     6.1%
R                   602     5.8%
A                   568     5.5%
L                   523     5.0%
K                   454     4.4%
Other values (21)   3502    33.6%

Value               Count   Frequency (%)
.                   163     72.1%
'                   63      27.9%

Value               Count   Frequency (%)
5                   1       50.0%
0                   1       50.0%

Value               Count   Frequency (%)
(space)             5237    100.0%

Value               Count   Frequency (%)
-                   78      100.0%

Most occurring scripts

Value               Count   Frequency (%)
Latin               58452   91.3%
Common              5543    8.7%

Most frequent character per script

Value               Count   Frequency (%)
e                   6030    10.3%
a                   5847    10.0%
n                   4474    7.7%
r                   4079    7.0%
i                   3867    6.6%
o                   3490    6.0%
l                   3413    5.8%
t                   2311    4.0%
s                   2265    3.9%
h                   1810    3.1%
Other values (65)   20866   35.7%

Value               Count   Frequency (%)
(space)             5237    94.5%
.                   163     2.9%
-                   78      1.4%
'                   63      1.1%
5                   1       < 0.1%
0                   1       < 0.1%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               63862   99.8%
None                133     0.2%

Most frequent character per block

Value               Count   Frequency (%)
e                   6030    9.4%
a                   5847    9.2%
(space)             5237    8.2%
n                   4474    7.0%
r                   4079    6.4%
i                   3867    6.1%
o                   3490    5.5%
l                   3413    5.3%
t                   2311    3.6%
s                   2265    3.5%
Other values (48)   22849   35.8%

Value               Count   Frequency (%)
é                   48      36.1%
í                   14      10.5%
á                   13      9.8%
ó                   9       6.8%
ë                   7       5.3%
ü                   7       5.3%
à                   5       3.8%
è                   4       3.0%
ç                   3       2.3%
ö                   3       2.3%
Other values (13)   20      15.0%
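
The category counts reported for actor_3_name rest on the Unicode general category of each character. As a rough illustration only (this is not the profiler's own code), the sketch below tallies categories with Python's standard `unicodedata` module over the five sample rows of this variable. The helper name `category_counts` and the `CATEGORY_NAMES` mapping are assumptions introduced for this example.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes seen in the sample names.
# Hypothetical lookup table; codes not listed are reported as-is.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample rows listed for actor_3_name.
sample = [
    "Wes Studi",
    "Jack Davenport",
    "Stephanie Sigman",
    "Joseph Gordon-Levitt",
    "Polly Walker",
]
counts = category_counts(sample)
for name, n in counts.most_common():
    print(name, n)
```

Run over the full column instead of the sample, the same tally should yield a breakdown of the kind shown under "Most occurring categories".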

facenumber_in_poster
Real number (ℝ≥0)

ZEROS

Distinct: 19
Distinct (%): 0.4%
Missing: 13
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.377320008
Minimum: 0
Maximum: 43
Zeros: 2089
Zeros (%): 42.5%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 5
Maximum: 43
Range: 43
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.023825749
Coefficient of variation (CV): 1.469393995
Kurtosis: 52.2146141
Mean: 1.377320008
Median Absolute Deviation (MAD): 1
Skewness: 4.405495913
Sum: 6753
Variance: 4.095870662
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value               Count   Frequency (%)
0                   2089    42.5%
1                   1224    24.9%
2                   702     14.3%
3                   369     7.5%
4                   198     4.0%
5                   113     2.3%
6                   75      1.5%
7                   48      1.0%
8                   37      0.8%
9                   17      0.3%
Other values (9)    31      0.6%
(Missing)           13      0.3%

Value               Count   Frequency (%)
0                   2089    42.5%
1                   1224    24.9%
2                   702     14.3%
3                   369     7.5%
4                   198     4.0%

Value               Count   Frequency (%)
43                  1       < 0.1%
31                  1       < 0.1%
19                  1       < 0.1%
15                  6       0.1%
14                  1       < 0.1%
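
Several of the descriptive statistics for facenumber_in_poster are simple functions of one another. The sketch below recomputes three of them from the reported mean, standard deviation and quartiles; it is a consistency check under the usual textbook definitions, not a description of how the report itself was generated.

```python
# Summary statistics copied from the facenumber_in_poster section.
mean = 1.377320008
std = 2.023825749
q1, q3 = 0, 2

cv = std / mean        # coefficient of variation = std / mean
variance = std ** 2    # variance = squared standard deviation
iqr = q3 - q1          # interquartile range = Q3 - Q1

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.6f}")
print(f"IQR      = {iqr}")
```

The recomputed values agree with the reported CV (1.469393995), variance (4.095870662) and IQR (2) to the precision shown.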

plot_keywords
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 4756
Distinct (%): 99.8%
Missing: 152
Missing (%): 3.1%
Memory size: 38.5 KiB

based on novel            4
one word title            3
two word title            2
after dark horrorfest     2
color in title            2
Other values (4751)       4751

Length

Max length: 149
Median length: 50
Mean length: 52.44542401
Min length: 2

Characters and Unicode

Total characters: 249850
Distinct characters: 42
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4751
Unique (%): 99.7%

Sample

1st row: avatar|future|marine|native|paraplegic
2nd row: goddess|marriage ceremony|marriage proposal|pirate|singapore
3rd row: bomb|espionage|sequel|spy|terrorist
4th row: deception|imprisonment|lawlessness|police officer|terrorist plot
5th row: alien|american civil war|male nipple|mars|princess
Value                                                          Count   Frequency (%)
based on novel                                                 4       0.1%
one word title                                                 3       0.1%
two word title                                                 2       < 0.1%
after dark horrorfest                                          2       < 0.1%
color in title                                                 2       < 0.1%
dragon|island|training|viking|village                          1       < 0.1%
box office flop|hawaii|naval|oahu hawaii|ship                  1       < 0.1%
island|sailor|storm|stranded|vacation                          1       < 0.1%
1970s|female rear nudity|formula 1|rivalry|sex with a nurse    1       < 0.1%
ash|father|mother|pokemon|professor                            1       < 0.1%
Other values (4746)                                            4746    96.5%
(Missing)                                                      152     3.1%
Histogram of lengths of the category
Value                   Count   Frequency (%)
in                      324     1.8%
of                      215     1.2%
on                      208     1.2%
the                     187     1.1%
a                       178     1.0%
to                      174     1.0%
york                    122     0.7%
based                   105     0.6%
female                  104     0.6%
by                      97      0.6%
Other values (11479)    15863   90.2%

Most occurring characters

Value               Count   Frequency (%)
e                   24178   9.7%
a                   19059   7.6%
|                   18714   7.5%
i                   18265   7.3%
r                   17649   7.1%
t                   15796   6.3%
n                   15281   6.1%
o                   15103   6.0%
s                   12955   5.2%
(space)             12813   5.1%
Other values (32)   80037   32.0%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    217006  86.9%
Math Symbol         18714   7.5%
Space Separator     12813   5.1%
Decimal Number      1099    0.4%
Other Punctuation   216     0.1%
Open Punctuation    1       < 0.1%
Close Punctuation   1       < 0.1%

Most frequent character per category

Value               Count   Frequency (%)
e                   24178   11.1%
a                   19059   8.8%
i                   18265   8.4%
r                   17649   8.1%
t                   15796   7.3%
n                   15281   7.0%
o                   15103   7.0%
s                   12955   6.0%
l                   10874   5.0%
c                   9222    4.2%
Other values (16)   58624   27.0%

Value               Count   Frequency (%)
1                   276     25.1%
0                   264     24.0%
9                   215     19.6%
2                   79      7.2%
8                   61      5.6%
5                   47      4.3%
7                   46      4.2%
3                   44      4.0%
6                   38      3.5%
4                   29      2.6%

Value               Count   Frequency (%)
.                   128     59.3%
'                   88      40.7%

Value               Count   Frequency (%)
|                   18714   100.0%

Value               Count   Frequency (%)
(space)             12813   100.0%

Value               Count   Frequency (%)
(                   1       100.0%

Value               Count   Frequency (%)
)                   1       100.0%

Most occurring scripts

Value               Count   Frequency (%)
Latin               217006  86.9%
Common              32844   13.1%

Most frequent character per script

Value               Count   Frequency (%)
e                   24178   11.1%
a                   19059   8.8%
i                   18265   8.4%
r                   17649   8.1%
t                   15796   7.3%
n                   15281   7.0%
o                   15103   7.0%
s                   12955   6.0%
l                   10874   5.0%
c                   9222    4.2%
Other values (16)   58624   27.0%

Value               Count   Frequency (%)
|                   18714   57.0%
(space)             12813   39.0%
1                   276     0.8%
0                   264     0.8%
9                   215     0.7%
.                   128     0.4%
'                   88      0.3%
2                   79      0.2%
8                   61      0.2%
5                   47      0.1%
Other values (6)    159     0.5%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               249850  100.0%

Most frequent character per block

Value               Count   Frequency (%)
e                   24178   9.7%
a                   19059   7.6%
|                   18714   7.5%
i                   18265   7.3%
r                   17649   7.1%
t                   15796   6.3%
n                   15281   6.1%
o                   15103   6.0%
s                   12955   5.2%
(space)             12813   5.1%
Other values (32)   80037   32.0%
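
The sample rows suggest that each plot_keywords value is a pipe-delimited list of keywords, which would explain why nearly every value is unique. Assuming that reading is right, a per-keyword frequency table can be built by splitting on `|` before counting. The sketch below uses only the five sample rows; the helper name `keyword_counts` is hypothetical.

```python
from collections import Counter

# The five sample rows listed for plot_keywords.
sample_rows = [
    "avatar|future|marine|native|paraplegic",
    "goddess|marriage ceremony|marriage proposal|pirate|singapore",
    "bomb|espionage|sequel|spy|terrorist",
    "deception|imprisonment|lawlessness|police officer|terrorist plot",
    "alien|american civil war|male nipple|mars|princess",
]


def keyword_counts(rows):
    """Split each row on '|' and count how often each keyword occurs."""
    counts = Counter()
    for row in rows:
        counts.update(kw.strip() for kw in row.split("|") if kw.strip())
    return counts


counts = keyword_counts(sample_rows)
print(len(counts), "distinct keywords in the sample")
```

Counting whole keywords rather than whole cells keeps multi-word entries such as "terrorist plot" separate from "terrorist".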

movie_imdb_link
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 4916
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 38.5 KiB

http://www.imdb.com/title/tt0385004/?ref_=fn_tt_tt_1    1
http://www.imdb.com/title/tt0139239/?ref_=fn_tt_tt_1    1
http://www.imdb.com/title/tt0099587/?ref_=fn_tt_tt_1    1
http://www.imdb.com/title/tt0091635/?ref_=fn_tt_tt_1    1
http://www.imdb.com/title/tt0080749/?ref_=fn_tt_tt_1    1
Other values (4911)                                     4911

Length

Max length: 52
Median length: 52
Mean length: 52
Min length: 52

Characters and Unicode

Total characters: 255632
Distinct characters: 31
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4916
Unique (%): 100.0%

Sample

1st row: http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1
2nd row: http://www.imdb.com/title/tt0449088/?ref_=fn_tt_tt_1
3rd row: http://www.imdb.com/title/tt2379713/?ref_=fn_tt_tt_1
4th row: http://www.imdb.com/title/tt1345836/?ref_=fn_tt_tt_1
5th row: http://www.imdb.com/title/tt5289954/?ref_=fn_tt_tt_1
Value                                                   Count   Frequency (%)
http://www.imdb.com/title/tt0385004/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0139239/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0099587/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0091635/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0080749/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0380599/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt1091191/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0923600/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0861689/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0360717/?ref_=fn_tt_tt_1    1       < 0.1%
Other values (4906)                                     4906    99.8%
Histogram of lengths of the category
Value                                                   Count   Frequency (%)
http://www.imdb.com/title/tt0385004/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0139239/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0099587/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0091635/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0080749/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0380599/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt1091191/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0923600/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0861689/?ref_=fn_tt_tt_1    1       < 0.1%
http://www.imdb.com/title/tt0360717/?ref_=fn_tt_tt_1    1       < 0.1%
Other values (4906)                                     4906    99.8%

Most occurring characters

Value                   Count   Frequency (%)
t                       49160   19.2%
/                       24580   9.6%
_                       19664   7.7%
w                       14748   5.8%
.                       9832    3.8%
i                       9832    3.8%
m                       9832    3.8%
e                       9832    3.8%
f                       9832    3.8%
1                       9667    3.8%
Other values (21)       88653   34.7%

Most occurring categories

Value                   Count   Frequency (%)
Lowercase Letter        147480  57.7%
Other Punctuation       44244   17.3%
Decimal Number          39328   15.4%
Connector Punctuation   19664   7.7%
Math Symbol             4916    1.9%

Most frequent character per category

Value                   Count   Frequency (%)
t                       49160   33.3%
w                       14748   10.0%
i                       9832    6.7%
m                       9832    6.7%
e                       9832    6.7%
f                       9832    6.7%
h                       4916    3.3%
p                       4916    3.3%
d                       4916    3.3%
b                       4916    3.3%
Other values (5)        24580   16.7%

Value                   Count   Frequency (%)
1                       9667    24.6%
0                       6632    16.9%
2                       3570    9.1%
3                       3158    8.0%
4                       3093    7.9%
8                       2848    7.2%
6                       2655    6.8%
9                       2652    6.7%
7                       2624    6.7%
5                       2429    6.2%

Value                   Count   Frequency (%)
/                       24580   55.6%
.                       9832    22.2%
:                       4916    11.1%
?                       4916    11.1%

Value                   Count   Frequency (%)
_                       19664   100.0%

Value                   Count   Frequency (%)
=                       4916    100.0%

Most occurring scripts

Value                   Count   Frequency (%)
Latin                   147480  57.7%
Common                  108152  42.3%

Most frequent character per script

Value                   Count   Frequency (%)
/                       24580   22.7%
_                       19664   18.2%
.                       9832    9.1%
1                       9667    8.9%
0                       6632    6.1%
:                       4916    4.5%
?                       4916    4.5%
=                       4916    4.5%
2                       3570    3.3%
3                       3158    2.9%
Other values (6)        16301   15.1%

Value                   Count   Frequency (%)
t                       49160   33.3%
w                       14748   10.0%
i                       9832    6.7%
m                       9832    6.7%
e                       9832    6.7%
f                       9832    6.7%
h                       4916    3.3%
p                       4916    3.3%
d                       4916    3.3%
b                       4916    3.3%
Other values (5)        24580   16.7%

Most occurring blocks

Value                   Count   Frequency (%)
ASCII                   255632  100.0%

Most frequent character per block

Value                   Count   Frequency (%)
t                       49160   19.2%
/                       24580   9.6%
_                       19664   7.7%
w                       14748   5.8%
.                       9832    3.8%
i                       9832    3.8%
m                       9832    3.8%
e                       9832    3.8%
f                       9832    3.8%
1                       9667    3.8%
Other values (21)       88653   34.7%
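
Every movie_imdb_link value has the same length (52 characters) and appears to differ only in the `tt…` title identifier. If that holds for the whole column, the identifier alone would serve as a compact key. The sketch below pulls it out with a regular expression; the pattern and the helper name `title_id` are assumptions based on the sample rows, not something the report states.

```python
import re

# Assumed URL layout, inferred from the sample rows: .../title/<tt + digits>/...
TITLE_ID = re.compile(r"/title/(tt\d+)/")


def title_id(url):
    """Return the 'tt…' identifier embedded in an IMDb title URL, or None."""
    match = TITLE_ID.search(url)
    return match.group(1) if match else None


# The five sample rows listed for movie_imdb_link.
sample_links = [
    "http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1",
    "http://www.imdb.com/title/tt0449088/?ref_=fn_tt_tt_1",
    "http://www.imdb.com/title/tt2379713/?ref_=fn_tt_tt_1",
    "http://www.imdb.com/title/tt1345836/?ref_=fn_tt_tt_1",
    "http://www.imdb.com/title/tt5289954/?ref_=fn_tt_tt_1",
]
ids = [title_id(url) for url in sample_links]
print(ids)
```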

num_user_for_reviews
Real number (ℝ≥0)

Distinct: 954
Distinct (%): 19.5%
Missing: 21
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 267.6688458
Minimum: 1
Maximum: 5060
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 9
Q1: 64
Median: 153
Q3: 320.5
95-th percentile: 889.3
Maximum: 5060
Range: 5059
Interquartile range (IQR): 256.5

Descriptive statistics

Standard deviation: 372.9348388
Coefficient of variation (CV): 1.3932695
Kurtosis: 28.0147829
Mean: 267.6688458
Median Absolute Deviation (MAD): 111
Skewness: 4.227610312
Sum: 1310239
Variance: 139080.394
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
1                   49      1.0%
3                   33      0.7%
26                  32      0.7%
2                   32      0.7%
10                  29      0.6%
6                   28      0.6%
50                  26      0.5%
8                   25      0.5%
32                  25      0.5%
31                  24      0.5%
Other values (944)  4592    93.4%

Value               Count   Frequency (%)
1                   49      1.0%
2                   32      0.7%
3                   33      0.7%
4                   23      0.5%
5                   19      0.4%

Value               Count   Frequency (%)
5060                1       < 0.1%
4667                1       < 0.1%
4144                1       < 0.1%
3646                1       < 0.1%
3597                1       < 0.1%

language
Categorical

Distinct: 47
Distinct (%): 1.0%
Missing: 12
Missing (%): 0.2%
Memory size: 38.5 KiB

English               4582
French                73
Spanish               40
Hindi                 28
Mandarin              24
Other values (42)     157

Length

Max length: 10
Median length: 7
Mean length: 6.980016313
Min length: 4

Characters and Unicode

Total characters: 34230
Distinct characters: 43
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): 0.4%

Sample

1st row: English
2nd row: English
3rd row: English
4th row: English
5th row: English
Value               Count   Frequency (%)
English             4582    93.2%
French              73      1.5%
Spanish             40      0.8%
Hindi               28      0.6%
Mandarin            24      0.5%
German              19      0.4%
Japanese            17      0.3%
Russian             11      0.2%
Cantonese           11      0.2%
Italian             11      0.2%
Other values (37)   88      1.8%
(Missing)           12      0.2%
Histogram of lengths of the category
Value               Count   Frequency (%)
english             4582    93.4%
french              73      1.5%
spanish             40      0.8%
hindi               28      0.6%
mandarin            24      0.5%
german              19      0.4%
japanese            17      0.3%
italian             11      0.2%
russian             11      0.2%
cantonese           11      0.2%
Other values (37)   88      1.8%

Most occurring characters

Value               Count   Frequency (%)
n                   4904    14.3%
i                   4781    14.0%
h                   4722    13.8%
s                   4704    13.7%
l                   4608    13.5%
g                   4600    13.4%
E                   4582    13.4%
a                   245     0.7%
e                   214     0.6%
r                   157     0.5%
Other values (33)   713     2.1%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    29326   85.7%
Uppercase Letter    4904    14.3%

Most frequent character per category

Value               Count   Frequency (%)
n                   4904    16.7%
i                   4781    16.3%
h                   4722    16.1%
s                   4704    16.0%
l                   4608    15.7%
g                   4600    15.7%
a                   245     0.8%
e                   214     0.7%
r                   157     0.5%
c                   88      0.3%
Other values (13)   303     1.0%

Value               Count   Frequency (%)
E                   4582    93.4%
F                   74      1.5%
S                   47      1.0%
H                   34      0.7%
M                   26      0.5%
G                   20      0.4%
J                   17      0.3%
P                   16      0.3%
C                   15      0.3%
I                   15      0.3%
Other values (10)   58      1.2%

Most occurring scripts

Value               Count   Frequency (%)
Latin               34230   100.0%

Most frequent character per script

Value               Count   Frequency (%)
n                   4904    14.3%
i                   4781    14.0%
h                   4722    13.8%
s                   4704    13.7%
l                   4608    13.5%
g                   4600    13.4%
E                   4582    13.4%
a                   245     0.7%
e                   214     0.6%
r                   157     0.5%
Other values (33)   713     2.1%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               34230   100.0%

Most frequent character per block

Value               Count   Frequency (%)
n                   4904    14.3%
i                   4781    14.0%
h                   4722    13.8%
s                   4704    13.7%
l                   4608    13.5%
g                   4600    13.4%
E                   4582    13.4%
a                   245     0.7%
e                   214     0.6%
r                   157     0.5%
Other values (33)   713     2.1%

country
Categorical

HIGH CARDINALITY

Distinct: 65
Distinct (%): 1.3%
Missing: 5
Missing (%): 0.1%
Memory size: 38.5 KiB

USA                   3710
UK                    434
France                154
Canada                124
Germany               94
Other values (60)     395

Length

Max length: 20
Median length: 3
Mean length: 3.489513337
Min length: 2

Characters and Unicode

Total characters: 17137
Distinct characters: 47
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 28
Unique (%): 0.6%

Sample

1st row: USA
2nd row: USA
3rd row: UK
4th row: USA
5th row: USA
Value               Count   Frequency (%)
USA                 3710    75.5%
UK                  434     8.8%
France              154     3.1%
Canada              124     2.5%
Germany             94      1.9%
Australia           53      1.1%
India               34      0.7%
Spain               33      0.7%
China               28      0.6%
Italy               23      0.5%
Other values (55)   224     4.6%
Histogram of lengths of the category
Value               Count   Frequency (%)
usa                 3710    74.6%
uk                  434     8.7%
france              154     3.1%
canada              124     2.5%
germany             97      2.0%
australia           53      1.1%
india               34      0.7%
spain               33      0.7%
china               28      0.6%
italy               23      0.5%
Other values (63)   283     5.7%

Most occurring characters

Value               Count   Frequency (%)
U                   4146    24.2%
A                   3778    22.0%
S                   3776    22.0%
a                   1068    6.2%
n                   626     3.7%
K                   466     2.7%
e                   399     2.3%
r                   398     2.3%
i                   244     1.4%
d                   212     1.2%
Other values (37)   2024    11.8%

Most occurring categories

Value               Count   Frequency (%)
Uppercase Letter    12826   74.8%
Lowercase Letter    4249    24.8%
Space Separator     62      0.4%

Most frequent character per category

Value               Count   Frequency (%)
a                   1068    25.1%
n                   626     14.7%
e                   399     9.4%
r                   398     9.4%
i                   244     5.7%
d                   212     5.0%
c                   193     4.5%
l                   147     3.5%
y                   136     3.2%
m                   122     2.9%
Other values (14)   704     16.6%

Value               Count   Frequency (%)
U                   4146    32.3%
A                   3778    29.5%
S                   3776    29.4%
K                   466     3.6%
C                   159     1.2%
F                   155     1.2%
G                   100     0.8%
I                   81      0.6%
N                   27      0.2%
J                   22      0.2%
Other values (12)   116     0.9%

Value               Count   Frequency (%)
(space)             62      100.0%

Most occurring scripts

Value               Count   Frequency (%)
Latin               17075   99.6%
Common              62      0.4%

Most frequent character per script

Value               Count   Frequency (%)
U                   4146    24.3%
A                   3778    22.1%
S                   3776    22.1%
a                   1068    6.3%
n                   626     3.7%
K                   466     2.7%
e                   399     2.3%
r                   398     2.3%
i                   244     1.4%
d                   212     1.2%
Other values (36)   1962    11.5%

Value               Count   Frequency (%)
(space)             62      100.0%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               17137   100.0%

Most frequent character per block

Value               Count   Frequency (%)
U                   4146    24.2%
A                   3778    22.0%
S                   3776    22.0%
a                   1068    6.2%
n                   626     3.7%
K                   466     2.7%
e                   399     2.3%
r                   398     2.3%
i                   244     1.4%
d                   212     1.2%
Other values (37)   2024    11.8%

content_rating
Categorical

MISSING

Distinct: 18
Distinct (%): 0.4%
Missing: 300
Missing (%): 6.1%
Memory size: 38.5 KiB

R                     2067
PG-13                 1411
PG                    686
Not Rated             115
G                     112
Other values (13)     225

Length

Max length: 9
Median length: 2
Mean length: 2.807192374
Min length: 1

Characters and Unicode

Total characters: 12958
Distinct characters: 28
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: PG-13
2nd row: PG-13
3rd row: PG-13
4th row: PG-13
5th row: PG-13
Value               Count   Frequency (%)
R                   2067    42.0%
PG-13               1411    28.7%
PG                  686     14.0%
Not Rated           115     2.3%
G                   112     2.3%
Unrated             59      1.2%
Approved            54      1.1%
TV-14               30      0.6%
TV-MA               18      0.4%
TV-PG               13      0.3%
Other values (8)    51      1.0%
(Missing)           300     6.1%
Histogram of lengths of the category
Value               Count   Frequency (%)
r                   2067    43.7%
pg-13               1411    29.8%
pg                  686     14.5%
rated               115     2.4%
not                 115     2.4%
g                   112     2.4%
unrated             59      1.2%
approved            54      1.1%
tv-14               30      0.6%
tv-ma               18      0.4%
Other values (9)    64      1.4%

Most occurring characters

Value               Count   Frequency (%)
G                   2238    17.3%
R                   2182    16.8%
P                   2125    16.4%
-                   1491    11.5%
1                   1448    11.2%
3                   1411    10.9%
t                   289     2.2%
e                   237     1.8%
d                   237     1.8%
a                   183     1.4%
Other values (18)   1117    8.6%

Most occurring categories

Value               Count   Frequency (%)
Uppercase Letter    6988    53.9%
Decimal Number      2897    22.4%
Dash Punctuation    1491    11.5%
Lowercase Letter    1467    11.3%
Space Separator     115     0.9%

Most frequent character per category

Value               Count   Frequency (%)
G                   2238    32.0%
R                   2182    31.2%
P                   2125    30.4%
N                   122     1.7%
T                   73      1.0%
V                   73      1.0%
A                   72      1.0%
U                   59      0.8%
M                   23      0.3%
X                   12      0.2%
Other values (2)    9       0.1%

Value               Count   Frequency (%)
t                   289     19.7%
e                   237     16.2%
d                   237     16.2%
a                   183     12.5%
o                   169     11.5%
r                   113     7.7%
p                   108     7.4%
n                   59      4.0%
v                   54      3.7%
s                   18      1.2%

Value               Count   Frequency (%)
1                   1448    50.0%
3                   1411    48.7%
4                   30      1.0%
7                   8       0.3%

Value               Count   Frequency (%)
-                   1491    100.0%

Value               Count   Frequency (%)
(space)             115     100.0%

Most occurring scripts

Value               Count   Frequency (%)
Latin               8455    65.2%
Common              4503    34.8%

Most frequent character per script

Value               Count   Frequency (%)
G                   2238    26.5%
R                   2182    25.8%
P                   2125    25.1%
t                   289     3.4%
e                   237     2.8%
d                   237     2.8%
a                   183     2.2%
o                   169     2.0%
N                   122     1.4%
r                   113     1.3%
Other values (12)   560     6.6%

Value               Count   Frequency (%)
-                   1491    33.1%
1                   1448    32.2%
3                   1411    31.3%
(space)             115     2.6%
4                   30      0.7%
7                   8       0.2%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               12958   100.0%

Most frequent character per block

Value               Count   Frequency (%)
G                   2238    17.3%
R                   2182    16.8%
P                   2125    16.4%
-                   1491    11.5%
1                   1448    11.2%
3                   1411    10.9%
t                   289     2.2%
e                   237     1.8%
d                   237     1.8%
a                   183     1.4%
Other values (18)   1117    8.6%

budget
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 438
Distinct (%): 9.9%
Missing: 484
Missing (%): 9.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 36547486.03
Minimum: 218
Maximum: 4200000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 218
5-th percentile: 500000
Q1: 6000000
Median: 19850000
Q3: 43000000
95-th percentile: 125000000
Maximum: 4200000000
Range: 4199999782
Interquartile range (IQR): 37000000

Descriptive statistics

Standard deviation: 100242679.2
Coefficient of variation (CV): 2.742806418
Kurtosis: 870.8894003
Mean: 36547486.03
Median Absolute Deviation (MAD): 15850000
Skewness: 25.36637236
Sum: 1.619784581 × 10^11
Variance: 1.004859474 × 10^16
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
20000000            168     3.4%
25000000            139     2.8%
15000000            139     2.8%
30000000            136     2.8%
10000000            133     2.7%
40000000            129     2.6%
35000000            117     2.4%
5000000             108     2.2%
50000000            99      2.0%
12000000            91      1.9%
Other values (428)  3173    64.5%
(Missing)           484     9.8%

Value               Count   Frequency (%)
218                 1       < 0.1%
1100                1       < 0.1%
1400                1       < 0.1%
3250                1       < 0.1%
4500                1       < 0.1%

Value               Count   Frequency (%)
4200000000          1       < 0.1%
2500000000          1       < 0.1%
2400000000          1       < 0.1%
2127519898          1       < 0.1%
1100000000          1       < 0.1%
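
The SKEWED flag on budget reflects a long right tail: the mean sits well above the median and the maximum is far larger than the 95-th percentile. A logarithmic scale is one common way to make such a distribution easier to inspect. The sketch below uses only the summary values reported above to show how many orders of magnitude the variable spans. Nothing in the report indicates that a transform was actually applied, so treat this purely as an illustration.

```python
import math

# Summary values copied from the budget section.
minimum = 218
median = 19_850_000
mean = 36_547_486.03
maximum = 4_200_000_000

# Orders of magnitude between the smallest and largest budget.
log_range = math.log10(maximum) - math.log10(minimum)

# A mean well above the median is a simple symptom of right skew.
mean_to_median = mean / median

print(f"log10 range: {log_range:.2f}")
print(f"mean / median: {mean_to_median:.2f}")
```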

title_year
Real number (ℝ≥0)

MISSING

Distinct: 91
Distinct (%): 1.9%
Missing: 106
Missing (%): 2.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2002.447609
Minimum: 1916
Maximum: 2016
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 1916
5-th percentile: 1979
Q1: 1999
Median: 2005
Q3: 2011
95-th percentile: 2015
Maximum: 2016
Range: 100
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 12.45397681
Coefficient of variation (CV): 0.0062193771
Kurtosis: 7.630278079
Mean: 2002.447609
Median Absolute Deviation (MAD): 6
Skewness: -2.320339567
Sum: 9631773
Variance: 155.1015383
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
2009                253     5.1%
2014                243     4.9%
2006                233     4.7%
2013                231     4.7%
2010                225     4.6%
2011                224     4.6%
2008                223     4.5%
2005                216     4.4%
2012                214     4.4%
2015                211     4.3%
Other values (81)   2537    51.6%

Value               Count   Frequency (%)
1916                1       < 0.1%
1920                1       < 0.1%
1925                1       < 0.1%
1927                1       < 0.1%
1929                2       < 0.1%

Value               Count   Frequency (%)
2016                98      2.0%
2015                211     4.3%
2014                243     4.9%
2013                231     4.7%
2012                214     4.4%

actor_2_facebook_likes
Real number (ℝ≥0)

ZEROS

Distinct: 917
Distinct (%): 18.7%
Missing: 13
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1621.923516
Minimum: 0
Maximum: 137000
Zeros: 55
Zeros (%): 1.1%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 25
Q1: 277
Median: 593
Q3: 912
95-th percentile: 11000
Maximum: 137000
Range: 137000
Interquartile range (IQR): 635

Descriptive statistics

Standard deviation: 4011.299523
Coefficient of variation (CV): 2.473174279
Kurtosis: 271.6032173
Mean: 1621.923516
Median Absolute Deviation (MAD): 318
Skewness: 10.25322055
Sum: 7952291
Variance: 16090523.86
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
1000                294     6.0%
11000               106     2.2%
2000                97      2.0%
3000                73      1.5%
0                   55      1.1%
10000               45      0.9%
13000               39      0.8%
14000               38      0.8%
826                 35      0.7%
4000                33      0.7%
Other values (907)  4088    83.2%

Value               Count   Frequency (%)
0                   55      1.1%
2                   14      0.3%
3                   13      0.3%
4                   11      0.2%
5                   10      0.2%

Value               Count   Frequency (%)
137000              1       < 0.1%
29000               1       < 0.1%
27000               2       < 0.1%
25000               2       < 0.1%
23000               6       0.1%

imdb_score
Real number (ℝ≥0)

Distinct: 78
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.437428804
Minimum: 1.6
Maximum: 9.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 1.6
5-th percentile: 4.3
Q1: 5.8
Median: 6.6
Q3: 7.2
95-th percentile: 8.1
Maximum: 9.5
Range: 7.9
Interquartile range (IQR): 1.4

Descriptive statistics

Standard deviation: 1.127802092
Coefficient of variation (CV): 0.1751944955
Kurtosis: 0.9292585715
Mean: 6.437428804
Median Absolute Deviation (MAD): 0.7
Skewness: -0.7404080151
Sum: 31646.4
Variance: 1.271937559
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
6.7                 216     4.4%
6.6                 199     4.0%
7.2                 187     3.8%
6.4                 183     3.7%
6.5                 182     3.7%
7.3                 180     3.7%
6.8                 178     3.6%
7.1                 177     3.6%
7                   177     3.6%
6.3                 175     3.6%
Other values (68)   3062    62.3%

Value               Count   Frequency (%)
1.6                 1       < 0.1%
1.7                 1       < 0.1%
1.9                 3       0.1%
2                   2       < 0.1%
2.1                 3       0.1%

Value               Count   Frequency (%)
9.5                 1       < 0.1%
9.3                 1       < 0.1%
9.2                 1       < 0.1%
9.1                 2       < 0.1%
9                   3       0.1%

aspect_ratio
Real number (ℝ≥0)

MISSING

Distinct: 22
Distinct (%): 0.5%
Missing: 326
Missing (%): 6.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.222348584
Minimum: 1.18
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 1.18
5-th percentile: 1.66
Q1: 1.85
Median: 2.35
Q3: 2.35
95-th percentile: 2.35
Maximum: 16
Range: 14.82
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 1.402939811
Coefficient of variation (CV): 0.6312870183
Kurtosis: 88.33874594
Mean: 2.222348584
Median Absolute Deviation (MAD): 0.04
Skewness: 9.277083835
Sum: 10200.58
Variance: 1.968240114
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=22)
Value               Count   Frequency (%)
2.35                2283    46.4%
1.85                1866    38.0%
1.78                108     2.2%
1.37                99      2.0%
1.33                66      1.3%
1.66                63      1.3%
16                  45      0.9%
2.39                15      0.3%
2.2                 14      0.3%
4                   7       0.1%
Other values (12)   24      0.5%
(Missing)           326     6.6%

Value               Count   Frequency (%)
1.18                1       < 0.1%
1.2                 1       < 0.1%
1.33                66      1.3%
1.37                99      2.0%
1.44                1       < 0.1%

Value               Count   Frequency (%)
16                  45      0.9%
4                   7       0.1%
2.76                3       0.1%
2.55                2       < 0.1%
2.4                 3       0.1%

movie_facebook_likes
Real number (ℝ≥0)

ZEROS

Distinct: 876
Distinct (%): 17.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7348.294142
Minimum: 0
Maximum: 349000
Zeros: 2130
Zeros (%): 43.3%
Memory size: 38.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 159
Q3: 2000
95-th percentile: 40000
Maximum: 349000
Range: 349000
Interquartile range (IQR): 2000

Descriptive statistics

Standard deviation: 19206.01646
Coefficient of variation (CV): 2.613670069
Kurtosis: 43.14957809
Mean: 7348.294142
Median Absolute Deviation (MAD): 159
Skewness: 5.177099068
Sum: 36124214
Variance: 368871068.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
0                   2130    43.3%
1000                108     2.2%
11000               79      1.6%
10000               76      1.5%
13000               58      1.2%
12000               58      1.2%
2000                56      1.1%
15000               48      1.0%
14000               47      1.0%
16000               46      0.9%
Other values (866)  2210    45.0%

Value               Count   Frequency (%)
0                   2130    43.3%
2                   2       < 0.1%
3                   1       < 0.1%
4                   5       0.1%
5                   2       < 0.1%

Value               Count   Frequency (%)
349000              1       < 0.1%
199000              1       < 0.1%
197000              1       < 0.1%
191000              1       < 0.1%
190000              1       < 0.1%

Interactions

(Interaction plots; figures not preserved.)
2021-04-06T19:46:18.442999image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:18.554966image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:18.659259image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:18.767236image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:18.993994image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:19.164894image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:19.323356image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:19.483686image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:19.615303image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:19.733573image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:19.848459image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:19.949890image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:20.050547image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:20.156375image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:20.375065image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:20.530145image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:20.638920image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:20.753938image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:20.891934image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:21.018581image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:21.119631image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:21.294897image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:21.470095image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:21.601059image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:21.718632image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:21.825092image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:21.919444image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:22.012595image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:22.108072image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:22.209235image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:22.470712image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:22.590111image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:22.687229image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:22.783938image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:22.874937image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:22.968991image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:23.071287image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:23.171103image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:23.272728image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:23.376161image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:23.483827image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:23.583731image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:23.682702image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:23.809831image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:23.907007image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.001663image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.101077image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.197568image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.288103image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.381700image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.494689image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.613524image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.715627image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.819620image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:24.926729image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.030574image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.130522image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.229639image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.328026image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.430435image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.531742image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.637369image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.739135image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.833103image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:25.930222image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.026632image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.128471image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.227971image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.325183image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.423094image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.530321image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.632547image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.733440image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.828838image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:26.933067image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:27.038760image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:27.148205image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:27.248950image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:27.361371image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:27.485356image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:27.578739image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:27.684562image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:27.789919image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:27.895513image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:28.004535image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:28.124096image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:28.230998image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:28.540047image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:28.663870image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:28.774094image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:28.882474image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:28.997776image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:29.110500image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:29.213732image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:29.317726image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:29.421550image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:29.530888image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:29.639037image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:29.749888image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:29.865804image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:29.979917image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:30.084236image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:30.187493image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:30.292199image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:30.400173image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:30.506834image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:30.618654image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:30.729815image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:30.833201image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:30.938287image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.040385image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.146176image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.256189image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.361698image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.471470image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.581563image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.678950image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.778050image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.879485image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:31.987357image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:32.095180image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:32.204726image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:32.310325image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:32.408941image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:32.510094image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:32.609035image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:32.710703image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:32.818044image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:32.923035image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:33.032954image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:33.147325image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:33.253393image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:33.359240image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:33.465914image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:33.578707image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:33.691922image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:33.808775image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:33.922096image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:34.028619image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:34.137486image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:34.245082image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:34.354443image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:34.469086image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-04-06T19:46:34.578980image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
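The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report generator runs; the function name `pearson_r` and the sample lists are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample values, chosen only to illustrate the behaviour.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
r_original = pearson_r(xs, ys)
# Shifting and rescaling (by a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
```

Comparing `r_original` with `r_rescaled` illustrates the invariance under changes of location and scale mentioned above.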

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
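As a rough sketch of this rank-based calculation, the snippet below replaces each value by its rank and then applies the covariance formula to the ranks. Giving tied values the average of their ranks is an assumption of this sketch (it is a common convention), and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A nonlinear but strictly increasing relationship (y = x cubed).
rho_cubic = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
```

Because the relationship in the example is monotonic, the ranks of the two lists coincide even though the relationship is not linear.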

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
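The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; it is an illustration under that assumption, and libraries often default to a tie-adjusted variant (tau-b), which can differ when the data contain ties. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables order it the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in xs or ys; it counts towards neither group.
    return (concordant - discordant) / len(pairs)


# Three observations: two concordant pairs and one discordant pair.
tau_example = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

The loop visits every pair, so the cost grows quadratically with the number of observations; that is acceptable for a sketch but slow for large columns.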

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
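A minimal sketch of the bias-corrected coefficient, computed from a contingency table of counts, might look as follows. It assumes the commonly cited form of Bergsma's correction, a chi-squared statistic without continuity correction, and a table with no empty rows or columns; the function name and the example tables are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic (assumes every expected count is non-zero).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi squared and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Hypothetical 2x2 tables: no association and perfect association.
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
```

Clamping the corrected phi squared at zero keeps the result inside the 0 to 1 range when the uncorrected statistic is smaller than its expected value under independence.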

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
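One way to read a single cell of such a heatmap is as the Pearson correlation between two presence indicators (1 if a value is present, 0 if it is missing). The sketch below works under that assumption; the function name, the use of `None` to mark missing values, and the example rows are all hypothetical and are not taken from the dataset.

```python
import math


def nullity_correlation(rows, col_a, col_b):
    """Correlation between the presence indicators of two columns."""
    a = [0 if row.get(col_a) is None else 1 for row in rows]
    b = [0 if row.get(col_b) is None else 1 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no
        # variation, so the correlation is undefined.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Made-up rows in which two columns are always missing together.
example_rows = [
    {"gross": 100.0, "budget": 50.0},
    {"gross": None, "budget": None},
    {"gross": 30.0, "budget": 20.0},
    {"gross": None, "budget": None},
]
together = nullity_correlation(example_rows, "gross", "budget")
```

A value near +1 means the two columns tend to be missing in the same rows, while a value near -1 means that one tends to be present exactly when the other is missing.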
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

movie_titlecolordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenresactor_1_namenum_voted_userscast_total_facebook_likesactor_3_namefacenumber_in_posterplot_keywordsmovie_imdb_linknum_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
0AvatarColorJames Cameron723.0178.00.0855.0Joel David Moore1000.0760505847.0Action|Adventure|Fantasy|Sci-FiCCH Pounder8862044834Wes Studi0.0avatar|future|marine|native|paraplegichttp://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_13054.0EnglishUSAPG-13237000000.02009.0936.07.91.7833000
1Pirates of the Caribbean: At World's EndColorGore Verbinski302.0169.0563.01000.0Orlando Bloom40000.0309404152.0Action|Adventure|FantasyJohnny Depp47122048350Jack Davenport0.0goddess|marriage ceremony|marriage proposal|pirate|singaporehttp://www.imdb.com/title/tt0449088/?ref_=fn_tt_tt_11238.0EnglishUSAPG-13300000000.02007.05000.07.12.350
2SpectreColorSam Mendes602.0148.00.0161.0Rory Kinnear11000.0200074175.0Action|Adventure|ThrillerChristoph Waltz27586811700Stephanie Sigman1.0bomb|espionage|sequel|spy|terroristhttp://www.imdb.com/title/tt2379713/?ref_=fn_tt_tt_1994.0EnglishUKPG-13245000000.02015.0393.06.82.3585000
3The Dark Knight RisesColorChristopher Nolan813.0164.022000.023000.0Christian Bale27000.0448130642.0Action|ThrillerTom Hardy1144337106759Joseph Gordon-Levitt0.0deception|imprisonment|lawlessness|police officer|terrorist plothttp://www.imdb.com/title/tt1345836/?ref_=fn_tt_tt_12701.0EnglishUSAPG-13250000000.02012.023000.08.52.35164000
4Star Wars: Episode VII - The Force AwakensNaNDoug WalkerNaNNaN131.0NaNRob Walker131.0NaNDocumentaryDoug Walker8143NaN0.0NaNhttp://www.imdb.com/title/tt5289954/?ref_=fn_tt_tt_1NaNNaNNaNNaNNaNNaN12.07.1NaN0
5John CarterColorAndrew Stanton462.0132.0475.0530.0Samantha Morton640.073058679.0Action|Adventure|Sci-FiDaryl Sabara2122041873Polly Walker1.0alien|american civil war|male nipple|mars|princesshttp://www.imdb.com/title/tt0401729/?ref_=fn_tt_tt_1738.0EnglishUSAPG-13263700000.02012.0632.06.62.3524000
6Spider-Man 3ColorSam Raimi392.0156.00.04000.0James Franco24000.0336530303.0Action|Adventure|RomanceJ.K. Simmons38305646055Kirsten Dunst0.0sandman|spider man|symbiote|venom|villainhttp://www.imdb.com/title/tt0413300/?ref_=fn_tt_tt_11902.0EnglishUSAPG-13258000000.02007.011000.06.22.350
7TangledColorNathan Greno324.0100.015.0284.0Donna Murphy799.0200807262.0Adventure|Animation|Comedy|Family|Fantasy|Musical|RomanceBrad Garrett2948102036M.C. Gainey1.017th century|based on fairy tale|disney|flower|towerhttp://www.imdb.com/title/tt0398286/?ref_=fn_tt_tt_1387.0EnglishUSAPG260000000.02010.0553.07.81.8529000
8Avengers: Age of UltronColorJoss Whedon635.0141.00.019000.0Robert Downey Jr.26000.0458991599.0Action|Adventure|Sci-FiChris Hemsworth46266992000Scarlett Johansson4.0artificial intelligence|based on comic book|captain america|marvel cinematic universe|superherohttp://www.imdb.com/title/tt2395427/?ref_=fn_tt_tt_11117.0EnglishUSAPG-13250000000.02015.021000.07.52.35118000
9Harry Potter and the Half-Blood PrinceColorDavid Yates375.0153.0282.010000.0Daniel Radcliffe25000.0301956980.0Adventure|Family|Fantasy|MysteryAlan Rickman32179558753Rupert Grint3.0blood|book|love|potion|professorhttp://www.imdb.com/title/tt0417741/?ref_=fn_tt_tt_1973.0EnglishUKPG250000000.02009.011000.07.52.3510000

Last rows

movie_titlecolordirector_namenum_critic_for_reviewsdurationdirector_facebook_likesactor_3_facebook_likesactor_2_nameactor_1_facebook_likesgrossgenresactor_1_namenum_voted_userscast_total_facebook_likesactor_3_namefacenumber_in_posterplot_keywordsmovie_imdb_linknum_user_for_reviewslanguagecountrycontent_ratingbudgettitle_yearactor_2_facebook_likesimdb_scoreaspect_ratiomovie_facebook_likes
4906PrimerColorShane Carruth143.077.0291.08.0David Sullivan291.0424760.0Drama|Sci-Fi|ThrillerShane Carruth72639368Casey Gooden0.0changing the future|independent film|invention|nonlinear timeline|time travelhttp://www.imdb.com/title/tt0390384/?ref_=fn_tt_tt_1371.0EnglishUSAPG-137000.02004.045.07.01.8519000
4907CaviteColorNeill Dela Llana35.080.00.00.0Edgar Tancangco0.070071.0ThrillerIan Gamazon5890Quynn Ton0.0jihad|mindanao|philippines|security guard|squatterhttp://www.imdb.com/title/tt0428303/?ref_=fn_tt_tt_135.0EnglishPhilippinesNot Rated7000.02005.00.06.3NaN74
4908El MariachiColorRobert Rodriguez56.081.00.06.0Peter Marquardt121.02040920.0Action|Crime|Drama|Romance|ThrillerCarlos Gallardo52055147Consuelo Gómez0.0assassin|death|guitar|gun|mariachihttp://www.imdb.com/title/tt0104815/?ref_=fn_tt_tt_1130.0SpanishUSAR7000.01992.020.06.91.370
4909The Mongol KingColorAnthony ValloneNaN84.02.02.0John Considine45.0NaNCrime|DramaRichard Jewell3693Sara Stepnicka0.0jewell|mongol|nostradamus|stepnicka|vallonehttp://www.imdb.com/title/tt0430371/?ref_=fn_tt_tt_11.0EnglishUSAPG-133250.02005.044.07.8NaN4
4910NewlywedsColorEdward Burns14.095.00.0133.0Caitlin FitzGerald296.04584.0Comedy|DramaKerry Bishé1338690Daniella Pineda1.0written and directed by cast memberhttp://www.imdb.com/title/tt1880418/?ref_=fn_tt_tt_114.0EnglishUSANot Rated9000.02011.0205.06.4NaN413
4911Signed Sealed DeliveredColorScott Smith1.087.02.0318.0Daphne Zuniga637.0NaNComedy|DramaEric Mabius6292283Crystal Lowe2.0fraud|postal worker|prison|theft|trialhttp://www.imdb.com/title/tt3000844/?ref_=fn_tt_tt_16.0EnglishCanadaNaNNaN2013.0470.07.7NaN84
4912The FollowingColorNaN43.043.0NaN319.0Valorie Curry841.0NaNCrime|Drama|Mystery|ThrillerNatalie Zea738391753Sam Underwood1.0cult|fbi|hideout|prison escape|serial killerhttp://www.imdb.com/title/tt2071645/?ref_=fn_tt_tt_1359.0EnglishUSATV-14NaNNaN593.07.516.0032000
4913A Plague So PleasantColorBenjamin Roberds13.076.00.00.0Maxwell Moody0.0NaNDrama|Horror|ThrillerEva Boehnke380David Chandler0.0NaNhttp://www.imdb.com/title/tt2107644/?ref_=fn_tt_tt_13.0EnglishUSANaN1400.02013.00.06.3NaN16
4914Shanghai CallingColorDaniel Hsia14.0100.00.0489.0Daniel Henney946.010443.0Comedy|Drama|RomanceAlan Ruck12552386Eliza Coupe5.0NaNhttp://www.imdb.com/title/tt2070597/?ref_=fn_tt_tt_19.0EnglishUSAPG-13NaN2012.0719.06.32.35660
4915My Date with DrewColorJon Gunn43.090.016.016.0Brian Herzlinger86.085222.0DocumentaryJohn August4285163Jon Gunn0.0actress name in title|crush|date|four word title|video camerahttp://www.imdb.com/title/tt0378407/?ref_=fn_tt_tt_184.0EnglishUSAPG1100.02004.023.06.61.85456